New algorithms for finding approximate frequent item sets

Authors

  • Christian Borgelt
  • Christian Braune
  • Tobias Kötter
  • Sonja Grün
Abstract

In standard frequent item set mining, a transaction supports an item set only if it contains all items in the set. In many cases this requirement is too strict and can make it impossible to find certain relevant groups of items. This drawback can be remedied by relaxing the support definition so that some items of a given set may be missing from a transaction. The resulting item sets have been called approximate, fault-tolerant, or fuzzy item sets. In this paper we present two new algorithms for finding such item sets. The first extends item set mining based on cover similarities: it computes and evaluates the subset size occurrence distribution with a scheme related to the Eclat algorithm. The second employs a clustering-like approach in which the distances are derived from the item covers using distance measures for sets or binary vectors, and which is initialized with a one-dimensional Sammon projection of the distance matrix. We demonstrate the benefits of our algorithms by applying them to a concept detection task on the 2008/2009 Wikipedia Selection for schools and to the neurobiological task of detecting neuron ensembles in (simulated) parallel spike trains.
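To make the relaxed support definition concrete, the following minimal sketch counts, for one item set, how many of its items each transaction contains, and derives an approximate support from that distribution. It also shows one possible set-based distance between item covers (the Jaccard distance). This is not the authors' algorithm: the function names, the `max_missing` parameter, and the toy transactions are assumptions made purely for illustration.

```python
from collections import Counter


def subset_size_distribution(transactions, item_set):
    """For each k, count the transactions that contain exactly k items of item_set."""
    items = frozenset(item_set)
    return Counter(len(items & set(t)) for t in transactions)


def approximate_support(transactions, item_set, max_missing=1):
    """Number of transactions missing at most max_missing items of item_set.

    max_missing is a hypothetical tolerance parameter; with max_missing=0
    this reduces to the standard (exact) support.
    """
    items = frozenset(item_set)
    dist = subset_size_distribution(transactions, items)
    return sum(n for k, n in dist.items() if k >= len(items) - max_missing)


def jaccard_distance(transactions, item_a, item_b):
    """Jaccard distance between the covers of two items.

    The cover of an item is the set of indices of transactions containing it.
    Jaccard is only one of several possible set distances.
    """
    cover_a = {i for i, t in enumerate(transactions) if item_a in t}
    cover_b = {i for i, t in enumerate(transactions) if item_b in t}
    union = cover_a | cover_b
    if not union:
        return 0.0
    return 1.0 - len(cover_a & cover_b) / len(union)


# Toy data, invented for this example.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c", "d"},
    {"b", "d"},
    {"d"},
]

exact = approximate_support(transactions, {"a", "b", "c"}, max_missing=0)
relaxed = approximate_support(transactions, {"a", "b", "c"}, max_missing=1)
print(exact, relaxed)  # 1 3
```

In this toy data only one transaction contains all of `a`, `b`, and `c`, but three contain at least two of them, so the item set becomes considerably more frequent once a single missing item is tolerated.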


Similar resources

A Novel Approach for finding Frequent Item Sets with Hybrid Strategies

Frequent item set mining plays an important role in association rule mining. Over the years, a variety of algorithms for finding frequent item sets in very large transaction databases have been developed. Therefore, a number of methods have been proposed recently to discover approximate frequent item sets. This paper proposes an efficient SMine (Sorted Mine) algorithm for finding frequent ite...


Comparison of Frequent Item Set Mining Algorithms

Frequent item set mining plays an important role in association rule mining. Over the years, a variety of algorithms for finding frequent item sets in very large transaction databases have been developed. The main focus of this paper is to analyze implementations of frequent item set mining algorithms such as the SMine and Apriori algorithms. General Terms: Data Mining, Frequent Item sets,...


Algorithm for Efficient Multilevel Association Rule Mining

Over the years, a variety of algorithms for finding frequent item sets in very large transaction databases have been developed. Finding frequent item sets is a basic problem in multilevel association rule mining, so fast algorithms for solving it are needed. This paper presents an efficient version of the Apriori algorithm for mining multilevel association rules in large databases to fi...


An efficient hash based algorithm for mining closed frequent item sets

Association rule discovery has emerged as an important problem in knowledge discovery and data mining. The association mining task consists of identifying the frequent item sets, and then forming conditional implication rules among them. Efficient algorithms to discover frequent patterns are crucial in data mining research. Finding frequent item sets is computationally the most expensive step i...


A Hybrid GeneticMax Algorithm for Improving the Traditional Genetic Based Approach for Mining Maximal Frequent Item Sets

Mining frequent item sets is one of the most useful data mining methods for discovering important relationships among the attributes of data sets. It was initially developed for market basket analysis, but these days it is used to solve any task where discovering hidden relationships among different attributes is required. Mining frequent item sets plays a vital role for generating association rule...



Journal:
  • Soft Comput.

Volume 16

Publication date: 2012